Multiple-object tracking is based on scene, not retinal, coordinates
Authors
Abstract
This study tested whether multiple-object tracking (the ability to visually index objects on the basis of their spatiotemporal history) is scene-based or image-based. Initial experiments showed equivalent tracking accuracy for objects in 2D and 3D motion. Subsequent experiments manipulated the speed of object motion independently of the speed of the scene as a whole. Results showed that tracking accuracy was influenced by object speed but not scene speed. This held whether the scene underwent translation, zoom, rotation, or even a combination of all three motions, which we termed the ‘wild ride.’ A final series of experiments interfered with observers’ ability to see a coherent 3D scene: observers tracked objects moving at different speeds (multiple speeds reduce scene coherence) or tracked objects moving at identical retinal speeds but in distorted 3D space. These manipulations reduced tracking accuracy, confirming that tracking is accomplished using a scene-based (allocentric) frame of reference.

An important task of the visual system is to keep track of objects as they move through space. Whether the observer is an air traffic controller tracking airplanes on a radar screen, or an athlete tracking team members and opposing players on a field, there is a need to maintain a visual index for objects that are changing in their spatial location over time. It has been shown that human observers can track up to four or five randomly moving objects with fairly good accuracy (Pylyshyn & Storm, 1988; Scholl, 2001; Yantis, 1992). Tracking performance is high even when the tracked objects are identical to untracked objects in all respects other than their motion paths, pointing to a tracking ability that is based solely on the spatiotemporal history of the objects. The ability to track multiple objects has been used to investigate the role of perceptual organization in tracking (Yantis, 1992), whether attention can be deployed in depth as readily as it can be deployed across vertical and horizontal space (Viswanathan & Mingolla, 1998), and the nature of visual object representations (Scholl & Pylyshyn, 1999).

However, there is a fundamental question concerning tracking that is not yet well understood. In what frame of reference are objects being tracked? Are the visual indices or ‘pointers’ that observers use to track objects pointing to locations on a retinotopic map (a coordinate system with respect to the retina), or are they pointing to locations in an allocentric map (a coordinate system with respect to the scene)? Although there is not yet any research on this question, there are good reasons to suspect that either one of these options may be correct.

One reason to suspect a retinal frame of reference is that the entire human visual system is organized at the physiological level in a retinotopic fashion. When neurons in one visual area of the brain (e.g., area V1) communicate with neurons in other areas (e.g., V5 or the temporal lobe), they tend to maintain a strict spatial correspondence. This means that neurons in different visual areas that are responding to the same object in a given visual field location are automatically linked, simply by virtue of their common reference to the same visual field location (Van Essen et al., 2001).
A mechanism that was designed to ‘keep a finger’ on an object as it moved over time would simply have to track the changing neural activity in one of these retinotopically organized visual areas.

Yet there are equally compelling reasons to suspect that tracking is accomplished using a reference frame tied to locations in the world rather than in the eye. One such reason comes from an examination of eye movements. Saccades, those high-speed ballistic eye movements that are made from one location to another, are referenced to stationary environmental landmarks rather than to specific retinal coordinates. This is evident when small changes are made to the locations of saccadic targets while the eye is en route to the target; the eye automatically corrects for these changes in location even when observers are unaware that the target has moved (Deubel, Bridgeman & Schneider, 1998). Smooth pursuit eye movements are also linked to environmental rather than retinal locations, as can be seen when a moving object is tracked while simultaneously rocking one’s head back and forth (Raymond, Shapiro & Rose, 1984). Studies of change blindness in scene perception tell a similar story. Large changes made to a scene during a brief interruption often go unnoticed by observers, provided that the overall gist and layout of the scene remain intact (Henderson & Hollingworth, 2002). These and many other psychophysical studies suggest that visual perception is geared toward registering the position of objects in the environment rather than registering objects with respect to their retinal location (Fecteau, Chua, Franks & Enns, 2001; Liu, Healey & Enns, 2003; Li & Warren, 2000).

The goal of the present study was to determine whether multiple object tracking is based on retinal or allocentric coordinates. Our approach began with the longstanding observation that tracking accuracy varies systematically with object speed: objects moving at a slower speed are generally tracked more accurately than objects moving at a higher speed (Pylyshyn & Storm, 1988; Yantis, 1992). However, in these studies retinal motion and scene motion are confounded. In the present study we varied the speed of object motion relative to the center of the scene (allocentric speed) separately from the speed of motion of the scene relative to the viewing frame (retinal speed). The prediction for a retinal-based tracking mechanism is that tracking accuracy should vary directly as a function of the speed with which the objects transit the eye, regardless of their relative speed of movement within the scene. However, if tracking is based on an allocentric frame of reference, then accuracy should vary most directly with the speed of objects within the scene, and retinal speed should not matter.

Overview of Experiments

In Experiment 1, the tracking accuracy for objects moving within the confines of a two-dimensional (2D) rectangle was compared with that for objects moving within a depicted three-dimensional (3D) box. In both conditions, the speed of the moving objects was varied to determine the sensitivity of tracking accuracy to changes in retinal speed. The results showed that objects could be tracked equally well in both situations, with a small tendency for tracking to be even more accurate in the 3D display. Most critically for the remaining experiments, tracking accuracy declined systematically with increases in object speed.

In Experiment 2, tracking accuracy for objects within the 3D box was measured while the box as a whole underwent a ‘wild ride,’ consisting of dynamic and simultaneous translations in the picture plane, rotations in depth about the vertical axis, and dilations and contractions in depth. That is, in addition to varying the relative speed of objects within the 3D box, the motion of the whole box varied in a complex way. Yet the results showed clearly that tracking accuracy was unaffected by these global variations in scene motion. Only the motion of the objects relative to the scene as a whole influenced tracking accuracy.
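To make the logic of this dissociation concrete, here is a minimal sketch of a composite scene motion of this kind. It is not the authors’ stimulus code: the function names, amplitudes, and frequencies are all illustrative assumptions. Object positions are expressed in box (allocentric) coordinates and mapped to display coordinates through one time-varying transform that translates the box in the picture plane, rotates it about the vertical axis, and uniformly scales it:

```python
import numpy as np

def scene_transform(t: float) -> np.ndarray:
    """4x4 homogeneous transform applied to the whole box at time t (s).
    All amplitudes and frequencies are placeholders, not experiment values."""
    # Oscillating translation in the picture plane
    T = np.eye(4)
    T[0, 3] = 1.5 * np.sin(0.7 * t)   # horizontal
    T[1, 3] = 0.8 * np.sin(0.4 * t)   # vertical

    # Rotation in depth about the vertical (y) axis
    a = 0.5 * np.sin(0.3 * t)
    R = np.eye(4)
    R[0, 0] = R[2, 2] = np.cos(a)
    R[0, 2], R[2, 0] = np.sin(a), -np.sin(a)

    # Dilation/contraction (motion in depth toward and away from the viewer)
    s = 1.0 + 0.25 * np.sin(0.5 * t)
    S = np.diag([s, s, s, 1.0])

    return T @ R @ S

def to_display(p_box: np.ndarray, t: float) -> np.ndarray:
    """Map a position in box (allocentric) coordinates to display coordinates."""
    return (scene_transform(t) @ np.append(p_box, 1.0))[:3]

# An object that is stationary *within the box* still moves on the screen,
# and hence the retina, while the box takes its ride:
p = np.array([0.5, 0.0, -1.0])
print(to_display(p, 0.0), to_display(p, 1.0))
```

The decomposition makes the two hypotheses separable: under the allocentric hypothesis, only the objects’ velocities in box coordinates should predict tracking accuracy, and the extra image motion contributed by the scene transform should be irrelevant; under the retinal hypothesis, the opposite holds.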
In Experiment 3 most of the pictorial support for the 3D box was removed from the display, in order to see to what extent the perception of a stable scene depended on the wire frame and the checkerboard ground plane that had been used to convey the layout of the scene. The results showed that tracking accuracy was unaffected by the removal of these cues to the third dimension. This suggested that the movement of the objects themselves, within the confines of the depicted 3D box, was sufficient to provide the ‘structure from motion’ necessary to perceive the layout of the 3D scene.

In Experiments 4 and 5 the allocentric tracking hypothesis was tested by attempting to reduce the perceived coherence of the 3D structure. In Experiment 4, the objects to be tracked moved at two different speeds within the same scene, thereby sharply reducing both the coherence of the 3D scene and tracking accuracy. Scene coherence was reduced in Experiment 5 by projecting the image of the scene onto the junction of two surfaces meeting at a dihedral angle. Even though the retinal projection for the observer in this condition was identical to the conditions in which tracking accuracy had been high (Experiments 2 and 3), tracking accuracy was reduced along with the reduced coherence of the scene. Taken together, these results provide strong support for the view that multiple object tracking is accomplished using an allocentric frame of reference.

Experiment 1: Baseline Tracking Performance

The purpose of Experiment 1 was to establish several important baseline measurements for the experiments that followed. First, because the displays in all the subsequent experiments depicted objects moving in a 3D scene, we sought to compare tracking accuracy in 2D and 3D displays as directly as possible. Previous studies have reported that multiple object tracking is not impaired in accuracy when objects disappear briefly as they pass behind occluding surfaces (Scholl & Pylyshyn, 1999). Studies using the additional cue of binocular disparity have reported improved tracking accuracy relative to control displays without this cue (Viswanathan & Mingolla, 2002). Tracking accuracy is also improved when the moving objects are distributed across two planes in depth rather than moving in only a single plane (Viswanathan & Mingolla, 2002). Our goal in Experiment 1 was therefore to provide as rich a 3D environment as possible, using only pictorial and motion cues for depth, and to compare tracking under these conditions with the ‘standard’ case of tracking on a 2D screen.

To manipulate motion relative to an allocentric frame in the present study, we depicted the objects moving within a 3D box defined by a wire frame and a checkerboard floor, as shown in Figure 1.
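As a rough illustration of how such a display can be animated, the sketch below moves a set of objects at a fixed allocentric speed inside the box. The box dimensions and the reflective-wall behavior are assumptions for the sake of a runnable example; the paper does not report either detail:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Half-extents of the box in scene units -- placeholder values, not
# the dimensions used in the experiment.
BOX = np.array([4.0, 3.0, 4.0])

def init_objects(n: int = 16, speed: float = 2.0):
    """Random positions in the box; random 3D directions at one fixed
    allocentric speed (cf. the 16 objects and fixed speeds in the text)."""
    pos = rng.uniform(-BOX, BOX, size=(n, 3))
    dirs = rng.normal(size=(n, 3))
    vel = speed * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    return pos, vel

def step(pos, vel, dt: float = 1 / 60):
    """Advance one frame; reflect off the box walls (an assumption --
    the paper does not state how wall collisions were handled)."""
    pos = pos + vel * dt
    hit = np.abs(pos) > BOX            # which coordinates left the box
    vel = np.where(hit, -vel, vel)     # reverse those velocity components
    pos = np.clip(pos, -BOX, BOX)      # keep positions inside the box
    return pos, vel

pos, vel = init_objects(n=16, speed=2.0)
for _ in range(600):                   # simulate 10 s at 60 Hz
    pos, vel = step(pos, vel)
```

Because reflections preserve speed, every object maintains a constant allocentric speed throughout the trial, which is the property the speed manipulations below depend on.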
To help reinforce the perception of 3D motion we added the depth cue of dynamic changes in relative size. When objects were closest to the viewer they subtended 0.8° of visual angle, and when they were farthest away they subtended 0.5°. At intermediate depth locations their size varied smoothly between these extreme values.

---------- Insert Figure 1 about here ----------

Our second goal was to establish tracking accuracy when the objects in the scene were moving at various speeds. In Experiment 1, all the objects in a display moved at the same speed, but the speed on any given trial was 1°, 2°, or 6° per second. This was a sufficiently large variation in speed to have a large influence on tracking accuracy.

Our third goal was to measure the decline in tracking accuracy as the number of objects was increased, thereby allowing us to obtain a stable measure of tracking ‘capacity’ for every condition that was tested (Pashler, 1988). In Experiment 1, a total of 16 moving objects were present in each display, but the number designated as targets varied randomly among 2, 4, and 6. This turned out to be a large enough range to observe tracking accuracy that was near perfect in some cases and near chance in others.
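The relative-size cue described above can be written as a simple mapping from an object’s depth within the box to its angular size. The linear form below is an assumption; the text states only that size varied smoothly between the two reported extremes:

```python
def angular_size(depth: float, near_deg: float = 0.8, far_deg: float = 0.5) -> float:
    """Angular size (deg) at a normalized depth within the box:
    depth = 0.0 at the nearest plane, 1.0 at the farthest plane.
    Linear interpolation is an assumption; true perspective scaling
    would instead be proportional to 1 / viewing distance."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    return near_deg + depth * (far_deg - near_deg)

print(angular_size(0.0))   # 0.8 deg at the nearest plane
print(angular_size(1.0))   # 0.5 deg at the farthest plane
print(angular_size(0.5))   # 0.65 deg midway through the box
```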
Publication year: 2005